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Abstract 

We wish to discriminate spike sequences based on the degree of irregularity. For this purpose, 
we search for a rational expressions of quadratic functions of consecutive interspike intervals that 
efficiently measures spiking irregularity. Under natural assumptions, the functional form of the 
coefficient can be parameterized by a single parameter. The parameter is determined so as to 
maximize the mutual information between the distributions of coefficients computed for spike 
sequences derived from different renewal point processes. We find that the local variation of 
interspike intervals, Ly (Neural Comput. Vol. 15, pp. 2823-42, 2003), is nearly optimal for whose 
intrinsic irregularity is close to that of experimental data. 

PACS numbers: Valid PACS appear here 
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I. INTRODUCTION 



It is important to extract as much information as possible from spike sequences when 
looking for correlations between animal behaviors and neuronal activities pi 0, 0, Q] or 
controlling prosthetic apparatuses by neuronal activities yj. In many cases, however, only 
the mean firing rate is considered and the timing information is not taken into account. 
Consideration of detailed temporal structure of each spike train would help to decode brain 
signals more efficiently We would like to propose a measure, which augments the information 
provided by the mean firing rate. 

Coefficients that are functions of the interspike intervals (ISIs) are effective in detecting 
a spiking irregularity from a short spike train. For instance, the coefficient of variation, Cy, 
is widely adopted as a measure of the variance of ISIs (| RQ, ^ ■ Recently, a measure of the 
local variation of interspike intervals, Ly, was proposed [10], as a natural extension of Cy2 
which was designed to detect a stepwise variation of consecutive ISIs jllj . An analysis using 
Ly revealed that in vivo spike sequences are not uniformly random, but possess specific 
characteristics that vary among individual neurons. In addition, it was found that the 
neocortex consists of heterogeneous neurons that differ not only from one cortical area to 
another, but also from one layer to another in their spiking patterns [12']. 

In the present study, we try to modify Ly in an attempt to find a better measure for 
discriminating spike sequences based on the degree of irregularity. Namely, we examine ra- 
tional expressions of quadratic functions of consecutive interspike intervals for suitability as 
coefficients for measuring spiking irregularity. Under reasonable assumptions, the functional 
form of the coefficient is found to be parameterized by a single parameter. The parameter 
is determined so as to maximize the mutual information between the distributions of co- 
efficients computed for finite size sample sequences derived from different renewal gamma 
processes. It is found that Ly is not optimal for nearly random Poisson spike trains but 
optimal for more regular spike trains. 

In Sec. m we explain how we generated spike sequences with the same firing rate but 
different intrinsic irregularity. We show that a gamma distribution suffices for that purpose 
and that two parameters in the gamma distribution can be chosen as orthogonal coordinates. 
In Sec. IHII we explain Ly and compare it with Cy. We show that attractiveness of Ly stems 
from its symmetries. In Sec. lIVI we extend Ly and show that, under reasonable assumptions, 
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the extension of Ly can be parameterized by a single parameter. In Sec.|Swe explain how we 
determined the optimal value of the parameter using the maximization principle of mutual 
information. In Sec. IV11 we determine the optimal value numerically. In Sec. IV11| we 
describe our theory, developed using a Gaussian approximation, for explaining the results. 
In Sec. IV1111 we discuss two non-stationary cases. 

II. GENERATING SPIKE TRAINS WITH DIFFERENT RANDOMNESS 

In this section, we explain how to generate spike trains with the same firing rate but 
different randomness. 

There are many ways to generate spike trains artificially. For example, we can generate 
spike trains by using a network of spiking neuron models. However, we do not need to 
describe precise spike timing here, and a simple mechanism is desirable. Therefore, we 
assume that the mechanism is a renewal process and that the inter spike interval (ISI) 
follows a gamma distribution [6], which is described as 



where T denotes an ISI. We generate ISIs from the distribution and align them to make a 
spike train. The mean and variance of the ISIs are 



The mean firing rate is obtained by taking the inverse of the mean ISI jl3l |. The k is a 
shape parameter; k = 1 corresponds to an exponential distribution, and, as k increases, the 
distribution approaches a normal distribution. The exponential distribution corresponds to a 
Poisson process in which the firing rate (hazard function) is constant with time independent 
of the previous firing time. The spike train looks random. As k increases, the variance of 
the ISIs decreases, and the ISIs become regular. 

Our goal is to find an optimal measure for discriminating two spike trains with different 
randomness independent of their mean firing rates. A gamma distribution is suitable for 
that purpose. First, we can control the mean firing rate and randomness independently by 
changing the two parameters (/i and k) in the distribution. Next, experimental data can 




(1) 



Ex(T) = fi 
Var(T) = £ 



(2) 
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be well fitted by the distribution. For example, Baker et al. showed that the spike pat- 



terns recorded 
distribution 



14- 



rom primary and supplementary motor areas are explicable using a gamma 



We can transform the parameters in a gamma distribution arbitrarily. For example, we 
can transform the parameters into (a, A): 

- - :> (3, 

The gamma distribution in this coordinate can be written as 

p(T) = j^T-e- (4) 

where A is a scale parameter. The mean and variance of the ISIs can be written as functions 
of a and A as 

[ Ex(T) = s 

(5) 

[ Var(T) = f . 

Thus, there are many ways of writing (parameterizing) a gamma distribution. We used the 
expression shown as Eq. (JTJ) because \i corresponds to the mean ISI and k is orthogonal to it 
in the sense of information geometry The proof is shown in APPENDIX El We call 

the parameters of a gamma distribution coordinates because we regard the family of gamma 
distributions as manifold. We would like to define randomness as information orthogonal to 
the firing rate. Therefore, we regard k as randomness in what follows. We generate spike 
trains having different intrinsic randomness by using the gamma distributions with different 
values of k. 

III. Ly AND C v 



The measure of local variation proposed by Shinomoto et al. ICt] is defined as 

L - 1 V 3 ^ - (6) 



where Tj denotes the i-th ISI in a spike train. The coefficient "3" is multiplied so that Ly 
is 1 for a Poisson process. Ly is large when consecutive ISIs differ. It is dimensionless and 



invariant if all the ISIs are multiplied by a constant. The conventional Cv is defined as 
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FIG. 1: Ly for doubly stochastic gamma process with various values of time constant r and rate 
amplitude A. 

Next we examine the difference between Ly and Cy and calculate Ly and Cy for the rate 
modulated gamma process. 

We define a rate modulated gamma process as an extension of a gamma distribution 
where the firing rate, A(t)(= ^y), is time-dependent while k is time- independent. The 
spikes for the rate modulated gamma process are generated as follows Note that we 

consider only the case of integer k. A spike is generated with probability X(t)dt for every 
small time step, dt. To be precise, we generate a uniform random number and if it is less 
than X(t)dt, we generate a spike at that time step. For the case where k is larger than 1, 
we keep every n-th spike and remove the others. What is left is the desired sequence. In 
fact, for the case where A is constant over time, the spike sequence generated in this way is 
equivalent to that generated from a renewal gamma distribution with fi = \. 

Here we consider a doubly stochastic gamma process whose firing rate obeys the Ornstein- 
Uhlenbeck process jl^. We assume the firing rate, A, satisfies 



d\ A — An A 2 . 



(8) 



where £ is Gaussian white noise, < £(£) >= 0, and < £ (£),£(£') >= S(t — t'). 

Fig. ^ and Fig. El show Ly and Cy with Ao = 1 for various values of time constant r and 
rate amplitude A. For simplicity, we consider sufficiently long spike sequences and assume 
that the values of Ly and Cy converge. Fig.^shows that in the limit of a large time constant, 




FIG. 2: Cy for doubly stochastic gamma process with various values of time constant r and rate 
amplitude A. 

the values of Ly converge to the value for the stationary case. This means that the value of 
Ly does not depend on the amplitude of the firing rate and has one-to-one correspondence 
with k in this limit. Fig. shows that Cy depends on both k and A and does not have 
one-to-one correspondence with k. Therefore, Ly is better than Cy for discriminating the 
intrinsic randomness of spike sequences. 

This attractive property seems to stem from the fact that Ly is the sum of the dimen- 
sionless terms of consecutive interspike intervals. By "dimensionless" we mean that the 
numerator and denominator have the same dimension. Every term in Ly is normalized 
locally by the average of two consecutive interspike intervals instead of the global average. 
Intuitively, because the firing rates for two consecutive interspike intervals can be regarded 
as the same in the slow limit, terms should be the same as those for the stationary case. On 
the other hand, Cy is the variance around the global mean of the ISIs and can be large for 
both the case where the firing rate fluctuates significantly and the case where the intrinsic 
randomness is large. Therefore, we cannot distinguish the two cases based on the value of 

Cy. 
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IV. MEASURE OF LOCAL VARIATION 



We extend Ly without losing its attractive property described in the previous section and 
find a better measure of intrinsic randomness. We do this by focusing on the ISI statistics 
and imposing three symmetry conditions: (1) time translation invariance, (2) time-scale 
transformation invariance, and (3) time inversion invariance. 

We assume the randomness of a spike train is constant over time and define the extended 
Ly as 

j n— 1 

i=i 

where Ti,T 2 , ...T n are the observed ISIs and f(x,y) does not depend on i explicitly. This 
form guarantees invariance under time translation (i — > i + 1) if n is infinite. Next, we 
assume that / is invariant under the time-scale transformation ( T — > kT). This requires 
that the denominator and numerator of / have the same dimension. For simplicity, we 
assume that the dimension is two, so / can be written as 

f( \ C!X 2 + c 2 xy + c 3 y 2 
C4X Z + c$xy + c$y z 

which includes the original Ly as a specific case. In addition, because we do not distinguish 
increases from decreases in the firing rate in terms of randomness, we impose time inversion 
invariance and require 

f(x,y) = f(y,x). (11) 

Thus, / can be written as 

) = £l£ ^w (12) 

c 4 x z + c 5 xy + c 4 y z 

Note that the absolute value of Ly does not matter in discriminant analysis, and we can 
add (or multiply by) a constant to /. Then, without loss of generality, / can be written as 

f(*,V)= 2-^— 2 - ( 13 ) 

x A + c$xy + y z 

In addition, we can rewrite the denominator using c = c 5 + 2: 

f(x,y) = 7 f- — • (14) 

[x - yy + cxy 

Because each term in the denominator is non-negative, the necessary and sufficient condition 
that the denominator always be positive is c > 0. 



- - k=2, A=0.5 

K=2, A= 1 




k=1,A=0.5 

K=1,A=1 



LO 

c> 




1e-01 



1e+00 



1e+01 



1e+02 1e+o; 



X 



FIG. 3: Ly(l) for doubly stochastic gamma process with various values of time constant r and 
rate amplitude A. 

As a result, Ly can be written as 



Note that the original Ly corresponds to the case of c = 4. In this way, the measures 
satisfying the symmetries have only one degree of freedom and can be parametrized by a 
single parameter. 

The Ly should have one-to-one correspondence to k like Ly because of its symmetries. 
In fact, it has the same values as those for the stationary case in the limit of a large time 
constant for the doubly stochastic gamma process. Fig. El shows that Ly(l) is independent 
of the rate amplitude, A, and is a function of k in the limit. The results for other values of 
c, for instance Ly(lQ), remain the same. 

Thus, Ly(c) has one-to-one correspondence with n. However, this is not sufficient to 
make it a good measure. We previously have considered only spike sequences with infinite 
length. However, in practical experimental situations, data sizes are limited, and Ly(c) 
varies widely by trial around the mean. Similarly, if spike sequences are generated using a 
gamma distribution, Ly(c) varies by trial for the finite spike sequence. In the discrimination 
of intrinsic randomness, roughly speaking, the smaller the variance, the higher the hitrate. 
Thus, we next search for an optimal value of parameter c, where the variance is the smallest. 




n-l 



(15) 
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V. MUTUAL INFORMATION MAXIMIZATION PRINCIPLE 

We use the mutual information maximization principle to determine an optimal measure. 
We assume that the firing rate is constant over time and spike sequences are generated by 
a gamma distribution, as shown in Sec. El As shown in Sec. Illl[ Ly does not depend on 
fi. Here we set fi = 1. We consider the stationary case because it is tractable and can be 
regarded as the slow change limit of the firing rate. We show in Sec. IVIIII that the optimal 
value of c for the nonstationary case does not differ significantly from that for the stationary 
case. 

The optimal parameter value is determined so as to maximize the mutual information 
between the coefficients and randomness. Here we assume that a spike train consists of 100 
ISIs because this is the typical length available from laboratory experiments. Ly can be 
computed for a spike train, and the value of Ly varies among spike trains. Even if spike 
trains are generated from the same distribution, the values of Ly can differ because the 
length of a spike train is finite. As a result, the distribution of Ly can be obtained for one 
parameter set of the gamma distribution. Thus, two distributions can be obtained from two 
types of spike trains. The mutual information can be computed from the two distributions. 
The bigger the mutual information, the better randomness (k) can be discriminated based 
on the observed Ly. 

Mutual information is calculated as follows. Spike trains are generated from two gamma 
distributions with equal probability, \. The two distribution have different k. All the ISIs 
in a spike train are generated by using the same distribution. We denote the distribution of 
Ly generated from the i{= 1, 2)-th gamma distribution as p{x\i); p(x)(= ^p(x\l) + |p(x|2)) 
represents the distribution of Ly with no distinction of the source. The entropy is defined 
as 




(16) 



The noise entropy is defined as 




(17) 



The mutual information is the difference, 



m 



H — H n . 



(18) 
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1e-01 1e+00 1e+01 1e+02 1e+03 
c 

FIG. 4: Mutual information with m = 1,^2 = 1.1- Open circle denotes peak. Dotted line is for 
c = 4 corresponding to original Ly. Mutual information has a peak with c larger than 4. 

The mutual information is the reduction in uncertainty about the spike trains due to the 
knowledge of Ly. Mutual information is if two distributions of Ly are identical so that 
they cannot be distinguished . Mutual information is 1 if two distributions of Ly have no 
overlap, and only one sample of Ly is needed to distinguish them. 

In the next section, we will show the results of a Monte Carlo simulation. We calculated 
mutual information as a function of c for various sets of randomness, K\ and 

VI. RESULTS 

Fig. |3] shows the mutual information with K\ = 1 and k 2 = 1.1; K\ and k 2 are the shape 
parameters of two gamma distributions and c is the parameter in Ly(c). We set the number 
of ISIs per spike train, n, to 100. The mutual information has a peak, whose location we 
denote by c pea k- The vertical line represents c = 4, which corresponds to the original Ly. 
Since c pea fc(~ 16) is bigger, the optimal coefficient in this case is not the original Ly but 
Ly (16). However, c pea k depends on various parameters, and we will examine how it depends 
on the number of ISIs per spike train, K\ and k 2 , in what follows. We can use the maximum 
likelihood estimator of k as a measure instead of Ly, and the peak value of the mutual 
information for k is 0.097. (For the maximum likelihood estimator, see Appendix El) The 
peak value for Ly is about 0.066, which is smaller than that for the maximum likelihood 
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FIG. 5: Mutual information for various numbers of ISIs per spike sequence with n\ = 1, «2 = 1.1. 
Open circles denote peaks. Dotted line is for c = 4 corresponding to original Ly. Peak location 
almost does not depend on number of ISIs. 

estimator. We nonetheless use Ly because the maximum likelihood estimator cannot be 
applied to the nonstationary case. In the cases where the firing rate is time-dependent, 
the mutual information for Ly can be much higher than that for the maximum likelihood 
estimator, as we will show in Sec. IVIIII 

Fig. El shows the mutual information for various numbers of ISIs per spike train. While 
the mutual information increases with the number of ISIs, the peak location remains almost 
the same. Although we show only the case for K\ = 1, K2 = 1-1, the other cases have similar 
results. Therefore, we set the number of ISIs per spike train to 100. 

Fig. IH1 shows the mutual information with K\ — 1 and various K2- As dn{= k>2 — Ki) in- 
creases, the mutual information approaches 1. The peak location remains almost unchanged 
{cpeak ~ 16). For k 2 = 3.2, the mutual information is almost 1, and the two distributions are 
completely distinguishable. In general, c pea k largely depends on K\ and is almost independent 
of /c 2 . 

Fig- HI shows the mutual information with K2 = 1.3ki and various K\. The peak location 
decreases with increasing K\. For Ki = 16, the original Ly is nearly optimal (c pea k ~ 4y / 2). 
Since reported experimental data can be well fitted by a gamma distribution with k ~ 16 
[14] , L v seems to be optimal not for the Poisson data but for the experimental data. 
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FIG. 6: Mutual information for various K2 with k\ = 1. Lines are for K2 = 0.1,0.2,0.4,0.8, 1.6 and 
3.2 from below. Open circles denote peaks. Dotted line is for c = 4 corresponding to original Ly. 
Peak location almost does not depend on ki- 



FIG. 7: Mutual information for various K\ with K2 = 1.3n±. Open circles denote peaks. Dotted 
line is for c = 4 corresponding to original Ly. Peak location decreases as n\ increases. 

VII. THEORETICAL ANALYSIS 

In this section we analyze the property of the mutual information theoretically. For 
simplicity, we do two approximations. 

First, we consider the limit of a large number of ISIs per spike train and approximate 
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the distribution of Ly by using the normal distribution. Although this approximation is 
not good for c ~ 0, the peak location is far larger than and can be discussed within this 
approximation. 

In addition, we consider the limit of small dn. In the limit, the mutual information can 
be written using the Fisher information ^| as 

Im = J (P( X > K ))<^ 2 , (19) 



where the Fisher information is defined as 



,dlogp(x, «) 2 



J(p(x, «)) = Ex(( > Y). (20) 

This relation can be easily derived. We represent two Ly distributions as 

pi ( x ) = 1 e -(x-m( K )r/2*(K)* ( 21 ) 



/ \ _ 1 p -{x-m{K+dK)) 2 /2a(K+dK) 2 , 22) 

'2™{k + <Ik) 2 ' V ' 



and 



Inserting these equations into the definition of the mutual information and expanding by dn 
to the second order lead to the relation. 

The Fisher information can be explicitly written as 

J= -'W'+yw. (23) 

Because a 2 is inversely proportional to N, a can be written as 

- = (24) 



'N 

The Fisher information can then be approximated as 

' " ^o(^) 2 N 

m'(n) 2 
ct (k) 2 ' 



(25) 



where m! and <jq depend on only k and c. As a result, the mutual information can be written 

as 

I m = l^P^Ndn 2 . (26) 
8 (t (k,c) 2 
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FIG. 8: Mutual information for monotonically decreasing firing rate for various r with K\ = 4 and 
k 2 = 5.2. 

Thus, I m is proportional to N and dn 2 . The c dependency of I m stems from only wl and 
o"o- Therefore, when N or dn changes, the absolute value of the mutual information changes 
while the peak location does not change. This is consistent with our numerical results in 
which the peak location did not depend on N and dn. The peak location can be explained 
by an interplay of ml and <jq. However, m and gq cannot be predicted solely by our theory. 
Numerical calculations are necessary for finding the peak location. 



VIII. NONSTATIONARY CASE 



We considered the discrimination of randomness for the stationary gamma process in the 
previous sections. However, it has been reported that experimental data can be explicable 
by the rate- modulated gamma process 1J|. Therefore, we consider the rate-modulated 
gamma process in this section. We show two simple cases in which the firing rate decreases 
monotonically or changes stepwise. 



A. monotonically decreasing firing rate 

We consider a simple rate-modulated case and show that the peak location of the mutual 
information, c pea k, tends not to change if the firing rate fluctuates significantly. We generate 
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the ISIs by again using a gamma distribution. We assume that the mean ISI increases 
monotonically. For simplicity, we set /ij = r l , where \L{ denotes the mean of the i-th ISI. We 
simply align n ISIs to make a single spike train as before. The value of k does not change 
within the train. The mutual information is calculated for two spike trains with different 
values of k. 

Fig.|H]shows the mutual information for K\ = 4 and k,2 = 5.2. The peak location decreases 
gradually from the stationary value as r increases. However, only extreme cases, in which 
the firing rates decrease more than 1.5 times one after another, are plotted. For realistic 
cases, c pea k changes only slightly. For example, the ratio between the last and first mean ISI 
is 

Ml 

and the ratio is 2.678033 for r = 1.01 and n = 100 and 12527.83 for r = 1.1 and n = 100. 
This illustrates that the 1.5 used for r is extremely large. Similar results were obtained for 
different values of k, so c pea k apparently tends not to change even if the firing rate fluctuates. 
This result is not restricted to the decreasing firing rate case. For example, the mean Ly 
remains the same if a small and a large mean ISI appear alternately instead of the firing rate 
increasing monotonically. It thus appears that the peak location of the mutual information 
is almost independent of the firing rate if the variation in the firing rate is small. 



B. stepwise changing firing rate 

So far we have considered only Ly{c). However, the maximum likelihood estimator, k, 
should be better for the stationary case. Here we consider the case of a stepwise changing 
firing rate to show why we favor Ly(c) nonetheless. In a word, k is not good for the 
nonstationary case because it is the maximum likelihood estimator for the stationary case, 
as shown in Appendix El In principle, the firing rate at every small time bin can be 
estimated for the nonstationary case. However, doing so requires many spike sequences and 
the firing rate profile must be the same for all the sequences. Therefore, it is not practical 
for many realistic cases. Instead we consider simple measures like Ly(c) and k even in the 
nonstationary case. In this section, we compare Ly and k for the nonstationary case. 

Consider the case in which the firing rate is stepwise increasing, as shown in the Fig. El 
At time t = 50, it shifts from 1 to A2. Two types of spike trains, with K\ = 16 and K2 = 20, 
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FIG. 9: Schematic diagram of stepwise increasing firing rate. Firing rate shifts from 1 to A2 at 
t = 50. 

are generated based on the firing rate profile. Fig. ITU1 shows the mutual information for these 
trains when Ly or k is used as a measure. The mutual information for Ly is independent 
of A2 in the limit of a large number of ISIs per train. The reason is that Ly is independent 
of the firing rate for the stationary case and in this case the firing rate is constant over 
time except for the discontinuous point. The contribution of the term in Ly that cross the 
discontinuous point is 0(l/n) and is small if the number of ISIs is large enough. We plotted 
the value for the stationary case, neglecting the contribution for simplicity. On the other 
hand, the mutual information for k decreases as A2 increases. For example, when the firing 
rate increases 1.5 times, the mutual information for Ly is larger than that for k. 

Thus, for a stepwise increasing firing rate, Ly is better than k. This type of sudden 
change can be observed when a visual stimulus is presented to a monkey at a given time. 
The result remains almost the same for the stepwise firing rate with multiple discontinuous 
points in the limit of a large number of ISIs. In addition, k depends on both k and the 
amplitude of the firing rate, as shown in Sec. |Tl]for Cy. Therefore, Ly is a better measure 
of intrinsic randomness. 
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FIG. 10: Mutual information for stepwise increasing firing rate with K\ = 16 and «2 = 20. 
IX. SUMMARY AND DISCUSSION 

In this study, we sought a measure more effective than the local variation of interspike 
intervals, Ly, in discriminating spike trains based on the degree of intrinsic spiking irregu- 
larity. 

We first compared characteristics of the conventional coefficient of variation, Cy, and the 
local variation, Ly. The coefficient of variation, Cy, measures a global variability of ISIs, 
and therefore depends on not only the local irregularity of ISIs but also the rate fluctuation, 
which would naturally manifest itself in in vivo neuronal spiking conditions. In contrast, 
the local variation, Ly, measures only a stepwise variability of ISIs, and therefore does not 
depends significantly on a rate fluctuation. It was revealed that Ly is superior to Cy in 
detecting some intrinsic spiking irregularity specific to individual neurons in vivo jlfll . u\ . 

For a spike train of a finite number of ISIs derived from a given point process, the value 
of Ly as well as Cy varies from trial to trial. The goodness of a coefficient is quantified 
by its narrow distribution of values among spike trains derived from the same point process 
and the small overlap of this distribution with the distribution obtained from spike trains 
derived from a different point process. In other words, we sought a new coefficient that 
maximizes the mutual information between spike sequences created from different renewal 
gamma processes. 

For this purpose, we adopted a rational expression of quadratic functions of consecutive 
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interspike intervals that is the same form as Ly, and searched for the optimal parameter of 
the coefficient. The optimal parameter of the coefficient depends on the choice of the point 
processes that are to be discriminated. It was found that the original Ly is not optimal for 
near random (Poisson) point processes, but is optimal for more regular spike trains. In this 
way, if we have preliminary knowledge of the spiking irregularities of the point processes, we 
are able to propose a better coefficient than the original Ly for the purpose of discriminating 
spike trains. 

We generated spike sequences entirely by using a stationary or rate-modulated gamma 
process. The reason is as follows. The Poisson process, in which the firing rate is represented 
as a function of time from stimulus onset, is widely used in spike data analysis 20(. However, 



the statistical properties o 



Poisson process 
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spike sequences cannot be fully captured by the rate-modulated 



2J]. In other words, spike probability depends on the past spike 



times due to the so-called refractory period. A gamma process is a Poisson process with 
an additional parameter representing a kind of refractory period. Baker et al. showed that 
the spike pattern recorded from primary and supplementary motor areas is explicable by a 
gamma process |l4(. 

We considered only mutual information as a measure for discriminating two spike trains. 
However, the Kullback-Leibler divergence D(pi,p2) is a well-known measure of the dis- 
similarity of two distributions, too. It is also proportional to the Fisher information, 
D = \ J(p(x, k))(1k 2 , under the same approximation as described in Sec. IV11I Note that 
the coefficient is \ instead of |, as seen in Eq. (TEJJ) . for the mutual information. However 
the coefficient is irrelevant to the peak location. Thus, the Kullback-Leibler divergence leads 
to the same results as mutual information. Nontheless, we used mutual information because 
it is symmetrical in terms of two distributions. The Kullback-Leibler divergence is not sym- 
metrical. Its value changes if the two distributions are interchanged. The Kullback-Leibler 
divergence becomes symmetrical in the limit of a small difference of two distributions, where 
it is proportional to the Fisher information. 

In previous studies, various measures were computed for mathematical models [l^,0,Q|- 
However, the focus was only on the expectations for the measures. In discrimination tasks, 
the variance of a measure is more important than the expectation. For example, consider the 
case in which the expectations of a measure for two different types of spike sequences differ 
considerably. If the variances are very large, discriminating the two sequences is difficult. In 
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addition, if the definition of a measure is changed, for example, multiplied or added to by 
a constant, the expectation changes, but the mutual information never changes. Therefore, 
in this paper we focused on the variance and searched for the measure that maximizes the 
mutual information. 



APPENDIX A: ORTHOGONAL COORDINATES FOR GAMMA DISTRIBU- 
TION 



We show that k and a are orthogonal coordinates in the sense of information geometry. 

nn 

The theory of informati on g eometry is described elsewhere [la, Ha], and there are applications 



to neuroscience 
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For the purpose of proving the orthogonality, it suffices to demonstrate that the Fisher 
information matrix is diagonal. The Fisher information matrix is defined as 

<91ogp(T) <91ogp(T) 



9ij 



-p(T)dT, 



where £ x = fi and £ 2 = k. The log-likelihood can be written as 

logp(T) = Klog(-) + (« — 1) logT - logr(ft) . 

The derivatives of the log-likelihood are 

aiogp(T) 



dfi 



K Tk 

- + — 

fi fl 2 



and 



d log p(T) k T 
= log - + 1 + logT - 1p(K) , 

OK fl fi 

where t/>(«) = (logr(fi;)) / . The matrix elements can be written as 

9mi 



K 

~2' 



(Al) 



(A2) 



(A3) 



(A4) 



(A5) 



1 

K 



(A6) 



(A7) 



20 



Thus, the Fisher information matrix is diagonal at every point. According to the theory of 
information geometry, it is always possible to choose orthogonal coordinates for an expo- 
nential family of distributions that includes the gamma distribution as a specific case. 

The Fisher information matrix has the meanings described below. When ji and k are 
estimated from a finite number of samples, the estimated values are not necessarily the 
same as the true value. The value of the maximum likelihood estimator varies depending 
on the sample sets, and its variation around the true value can be approximated by a 
normal distribution whose variance is the inverse of the Fisher matrix if the sample size is 
sufficiently large Q]. Thus, the diagonality of the Fisher matrix means that the variations 
in the maximum likelihood estimators of fi and k are uncorrelated. 



APPENDIX B: MAXIMUM LIKELIHOOD ESTIMATION FOR GAMMA DIS- 
TRIBUTION 

Let Ti,T 2 , ...,T„ be observed ISIs. We would like to estimate the true values of \i and k 
from them. The log-likelihood is defined as 

l = ln(p(T 1 )p(T 2 )...p(T n )) (Bl) 

and can be written as 

I = n«ln- -uT(k) + (K-l)^lnTi - ^2 T i- ( B2 ) 
The maximum likelihood estimators must satisfy both ^- — and -P- = 0. The derivatives 

J 0[l OK 

of the log-likelihood are 

dl K \ -v k . . 

7T = —J2 T *- n - 53 
dfx /r ^— ' n 

and 

dl Ik 

— = VlnTi > T t + n In- + n-mpU). (B4) 

Then, fi can be explicitly obtained as 



n 



l -H T - ( B5 ) 
and k must satisfy 

1 1 

- VlnTi-ln- y^Ti = iP(k) -Ink, (B6) 
n n 
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where ip(k) = (logT^))'. This equation cannot be solved explicitly for k. However, the 
right side of the equation is a monotonic function of k, and we can obtain k by numerical 
iteration. 

Instead of a lengthy numerical iteration, we can use the moment estimator. According 
to Eq. (J2J), we can estimate the true k from the sample mean and variance: 

Ex(T) 2 

= v^ry (B7) 

In fact, the right side of Eq. (|B7J) can be rewritten as ^4-. Thus, we can regard Cy as a 
moment estimator. However, the moment estimator is worse than the maximum likelihood 
estimator, especially when k is close to 1 [6j. Nevertheless, it is good as a first approximation, 
and we can use it as the initial value of the numerical iteration in maximum likelihood 
estimation. 

Another way to avoid numerical iteration is to use the left side of Eq. (jB6|) as a measure. 
In discriminant analysis, we do not need to estimate k because the left side of Eq. ()B6j) has 
one-to-one correspondence with k and has the same information as k. 
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